Data Guide
==============

This document lists the raw input files used in this package, together with short descriptions of the data. All paths are relative to the root data folder, referred to as PROJECT_PATH in the code. Please see the accompanying READMEs and code files for more details on how these data are used. Files marked with (†) are publicly available and are included in the replication package. Files not marked with (†) contain commercially available data: these cannot be included in the package, but the data folder includes pseudo-data that illustrates their structure.
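Because every path in this guide is relative to the root data folder, a quick check of which inputs are present can be sketched as follows. This is a minimal illustration and not part of the package's code: the helper name `missing_inputs` is hypothetical, and reading `PROJECT_PATH` from an environment variable is an assumption (the package's own code may define it differently).

```python
import os
from pathlib import Path


def missing_inputs(project_path, relative_paths):
    """Return the relative paths that do not exist under project_path."""
    root = Path(project_path)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: PROJECT_PATH is exported as an environment variable.
    root = os.environ.get("PROJECT_PATH", ".")
    # A few of the publicly available (†) files listed in this guide.
    public_files = [
        "raw/imf_ifs/ISO_currency.xls",
        "raw/imf_ifs/IMF_codes.xlsx",
        "raw/imf_ifs/IFS_ERdata.csv",
    ]
    for rel in missing_inputs(root, public_files):
        print(f"missing: {rel}")
```

Wildcard entries such as `PortfolioSummary-*.xls` would need `Path.glob` rather than a plain existence check.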

## Morningstar Fund Data

The raw Morningstar holdings data come as a series of archives (`.7z`, `.zip`, or `.gz`) containing XML files with the relevant holdings. The XML files come in two different structures: one used for the earlier years of the data, and one used for more recent monthly deliveries. The data can be purchased from Morningstar. The files have the following naming conventions:

1. `raw/morningstar/historical/[US/NonUS]_[FO/FM/FE]_[Active/Inactive]_[PERIOD].[7z/zip/gz]`
2. `raw/morningstar/monthly_new/[PERIOD]/[FO/FM/FE]_[DOMICILE]/*.xml.[7z/zip/gz]`

The historical archives contain holdings data for a given month, a particular fund category, a geographic region (US, non-US), and a given fund activity status. Each of these archives contains several XML files, one per portfolio, with the naming convention `[MasterPortfolioId].xml`, where `MasterPortfolioId` is the unique alphanumeric portfolio identifier used by Morningstar. The deliveries in the new format (`monthly_new`) contain archives for a given time period, fund category, and fund domicile country, and the XML files contained within are not at the individual MasterPortfolioId level (unlike in the `historical` format), but rather contain data for all relevant MasterPortfolioId values. The replication package includes a sample XML pseudo-data file for each of these two formats to illustrate the structure of the data.
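The naming convention for the historical archives can be unpacked programmatically. The sketch below is one possible way to do so and is not taken from the package's code: the function name `parse_historical_name` is hypothetical, and because the format of `[PERIOD]` is not specified in this guide, the period is kept as an opaque string (the value `2015-06` in the usage example is made up).

```python
import re

# Mirrors [US/NonUS]_[FO/FM/FE]_[Active/Inactive]_[PERIOD].[7z/zip/gz].
# Assumption: PERIOD is treated as free text, since its format is not
# documented here.
HISTORICAL_PATTERN = re.compile(
    r"^(?P<region>US|NonUS)"
    r"_(?P<category>FO|FM|FE)"
    r"_(?P<status>Active|Inactive)"
    r"_(?P<period>.+)"
    r"\.(?P<ext>7z|zip|gz)$"
)


def parse_historical_name(filename):
    """Split a historical archive name into its components.

    Returns a dict of components, or None if the name does not follow
    the convention.
    """
    match = HISTORICAL_PATTERN.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Hypothetical file name, used only to illustrate the convention.
    print(parse_historical_name("NonUS_FM_Inactive_2015-06.7z"))
```

Returning `None` for non-matching names makes it easy to skip stray files when iterating over the `raw/morningstar/historical` folder.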

The following additional files provide metadata on the funds in the Morningstar universe. Files 3 through 7 provide static fund-level information, with the year in the filename corresponding to the vintage of data delivery from Morningstar. File 8 provides a time series of monthly fund flows, obtained via the Morningstar Direct platform:

3. `raw/morningstar/mapping/MappingMorningstar_2018.xlsx`
4. `raw/morningstar/mapping/MorningstarUniverse_2020.xlsx`
5. `raw/morningstar/mapping/GlobalFOMorningstarMap_2021.csv`
6. `raw/morningstar/mapping/GlobalDeadMorningstarMap_2021.csv`
7. `raw/morningstar/mapping/NonFOGlobalMorningstarMap_2021.csv`
8. `raw/morningstar/flows/secid_all_monthly_flows.csv`

## S&P Global Insurance Data

The following files contain the quarterly security-level holdings of US insurers, which are reported to the National Association of Insurance Commissioners (NAIC). I use the version of the data provided by the division of S&P Global formerly known as SNL Financial, which can be obtained from S&P. The holdings data come in the form of `.xlsm` or `.xls` (Excel) files, one per insurer-quarter pair, covering all three insurance segments in the United States (life, health, property and casualty):

1. `raw/sp_insurance/life/[FIRM]-[DATE_Q].[xls/xlsm]`
2. `raw/sp_insurance/health/[FIRM]-[DATE_Q].[xls/xlsm]`
3. `raw/sp_insurance/pc/[FIRM]-[DATE_Q].[xls/xlsm]`

The following accompanying files contain a list of the individual insurers included in the data and a crosswalk from insurer names to S&P entity keys:

4. `raw/sp_insurance/Firms_List.xlsm`
5. `raw/sp_insurance/additional/sp_insurer_keys.xls`

The following files contain aggregated portfolio summary statistics for each of the individual insurers in the sample:

6. `raw/sp_insurance/additional/PortfolioSummary-*.xls`

Lastly, the following files provide aggregate, industry-level information on the balance sheet and income statements of each of the three insurance segments as a whole:

7. `raw/sp_insurance/additional/[INDUSTRY_SEGMENT] Industry Balance Sheet, Part *.xlsx`
8. `raw/sp_insurance/additional/[INDUSTRY_SEGMENT] Industry Income Statement, Part *.xlsx`

## Dealogic

The following files correspond to tables in the Dealogic Debt Capital Markets (DCM) database, providing information on bond issues. The data can be obtained commercially from Dealogic:

1. `raw/dealogic/Company.dta`
2. `raw/dealogic/CompanySICCodes.dta`
3. `raw/dealogic/CompanyNAICSCodes.dta`
4. `raw/dealogic/DCMDealTranches.dta`
5. `raw/dealogic/DCMDealTranchesValue.dta`
6. `raw/dealogic/DCMDealTranchesISINs.dta`
7. `raw/dealogic/DCMDealTranchesIssueCharacteristics.dta`
8. `raw/dealogic/DCMDealTranchesProceeds.dta`
9. `raw/dealogic/DCMDeal.dta`
10. `raw/dealogic/Frequency.dta`

## Moody's

The following files correspond to tables from the Moody's Default & Recovery Database (DRD), which contains the universe of credit ratings issued by Moody's as well as further information on the relevant set of securities. The data can be acquired commercially from Moody's:

1. `raw/moodys/MAST_ISSR.csv`
2. `raw/moodys/MAST_DEBT.csv`
3. `raw/moodys/DEBT_IDS.csv`
4. `raw/moodys/ISSR_IDS.csv`
5. `raw/moodys/DEBT_RATG.csv`
6. `raw/moodys/DFLT_HIST.csv`
7. `raw/moodys/DFLT_RCVRY_DEBT.csv`
8. `raw/moodys/MAST_DFLT.csv`

## IMF International Financial Statistics (IFS)

The following files are sourced from the IMF’s website and provide exchange rate data from the IMF’s International Financial Statistics (IFS) data, as well as basic country- and currency-level symbology files:

1. `raw/imf_ifs/ISO_currency.xls (†)`
2. `raw/imf_ifs/IMF_codes.xlsx (†)`
3. `raw/imf_ifs/IFS_ERdata.csv (†)`

## S&P Global Capital IQ

The following files contain data on bond securities from Capital IQ (S&P Global). Files 1 and 2 contain data on static bond characteristics, queried respectively using CUSIP and ISIN codes, while files 3 through 5 contain time-varying data on duration, amounts outstanding, and ratings. Files 6 through 9 contain bond prices spanning the four event windows studied in the paper:

1. `raw/ciq/CIQ-Characteristics-CUSIP.csv`
2. `raw/ciq/CIQ-Characteristics-ISIN.csv`
3. `raw/ciq/CIQ-Duration.csv`
4. `raw/ciq/CIQ-Amounts.csv`
5. `raw/ciq/CIQ-Ratings.csv`
6. `raw/ciq/CIQ-Prices-COVID.csv`
7. `raw/ciq/CIQ-Prices-GR.csv`
8. `raw/ciq/CIQ-Prices-P16.csv`
9. `raw/ciq/CIQ-Prices-P11.csv`

## Compustat

The following file contains data from the Compustat annual North America file, which can be obtained from WRDS or other Compustat data distributors:

1. `raw/compustat/compustat_northam_annual.dta`

## Factset

The following files correspond to tables from the Factset Data Management Solutions (DMS) and Factset Debt Capital Structure (DCS) databases. These can be purchased directly from Factset:

1. `raw/factset/dcs_details.dta`
2. `raw/factset/sym_cusip.dta`
3. `raw/factset/sym_sec_entity.dta`
4. `raw/factset/ent_entity_naics_rank.dta`

The following files contain zero-coupon sovereign bond price benchmark series from Factset, one per country-term pair:

5. `raw/factset/sovereign_benchmarks/USA_[Term]Y.xlsx`
6. `raw/factset/sovereign_benchmarks/CAN_[Term]Y.xlsx`
7. `raw/factset/sovereign_benchmarks/DEU_[Term]Y.xlsx`
8. `raw/factset/sovereign_benchmarks/GBR_[Term]Y.xlsx`

The following file provides a time series of assets under management for a sample of ETFs used in the Morningstar holdings data build, obtainable via queries through the Factset Workstation product:

9. `raw/factset/workstation/factset_etf_aum.xlsx`

## CRSP

The following files correspond to tables from the CRSP dataset. File 1 is the CRSP monthly stock file, file 2 contains the CRSP-Compustat crosswalk, and file 3 is the CRSP daily Treasury time series: 

1. `raw/crsp/crsp_monthly.dta`
2. `raw/crsp/crsp_compustat_link.dta`
3. `raw/crsp/tfz_dly_a.dta`

## Mergent

The following files correspond to tables from the Mergent FISD dataset, which provides information on bond issues:

1. `raw/mergent/fisd_issue_agents.dta`
2. `raw/mergent/fisd_agent.dta`
3. `raw/mergent/mergent_combined_issue.dta`

## TRACE

The following two files contain the transaction-level data from the TRACE Enhanced and TRACE Standard databases maintained by FINRA. The files contain versions of the data cleaned following the steps in Dick-Nielsen (2014), which are common in the literature. These files can be obtained by running the SAS cleaning code included in the `clean_trace` directory of the present replication package on the WRDS server (e.g., via the WRDS SAS Studio at https://wrds-cloud.wharton.upenn.edu/SASStudio):

1. `raw/trace/trace_enhanced_clean.dta`
2. `raw/trace/trace_standard_clean.dta`

The code additionally makes use of the following file, which is the corporate issues masterfile from TRACE: 

3. `raw/trace/trace_corp_master.dta`

## CUSIP Global Services (CGS)

The following raw data is from CUSIP Global Services (CGS, part of S&P). These are security- and issuer-level master files for global CUSIP-bearing securities. These files can be obtained commercially from CGS:

1. `raw/cgs/INCMSTR.PIP`
2. `raw/cgs/CPMASTER_ATTRIBUTE.PIP`
3. `raw/cgs/CPMASTER_ISSUE.PIP`
4. `raw/cgs/CPMASTER_ISSUER.PIP`
5. `raw/cgs/ALLCNPMASTER_ISIN.PIP`
6. `raw/cgs/ACMD*.PIP`
7. `raw/cgs/ALLCNPMASTER_ISSUER.PIP`
8. `raw/cgs/FFAPlusMASTER.PIP`
9. `raw/cgs/TBA Master File - Sept 2012 Rev.txt`
10. `raw/cgs/AIMASTER.PIP`
11. `raw/cgs/CBRLEIMSTR.PIP`
12. `raw/cgs/master_20131211.GM`
13. `raw/cgs/issue_20170912.GM`
14. `raw/cgs/master_20100512.SB`
15. `raw/cgs/issue_20081208.SB`
16. `raw/cgs/master_20160809.FM`
17. `raw/cgs/issue_20160809.FM`
18. `raw/cgs/master_20160815.FD`
19. `raw/cgs/issue_upd_20180718.FD`
20. `raw/cgs/master_20061206.IB`
21. `raw/cgs/issue_20061206.IB`

Items 1 through 8 provide data from the CUSIP_db, ISIN_db, and CINS_db master file products from CGS. Item 6 (`ACMD*`) is split across multiple files with the same structure in the data delivery from CGS, and a sample pseudo-dataset is included in the replication package. Items 8 and 9 provide data on 144A and TBA issues, respectively. Item 10 is the CGS Associated Issuers master file, which provides information about relationships among CUSIP6 issuer numbers; this is useful in establishing which CUSIP6 numbers belong to the same issuing entity. Item 11 is the CGS LEI Plus mapping data product. Items 12 through 21 contain information on agency and supranational issues.
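For readers less familiar with the identifier, a CUSIP6 is the six-character issuer number that begins every CUSIP. The sketch below is illustrative only and is not taken from the package's code: it groups securities by that prefix, the function names are hypothetical, and the sample identifiers are made-up placeholders rather than real securities.

```python
from collections import defaultdict


def cusip6(cusip):
    """Return the six-character issuer number at the start of a CUSIP."""
    cleaned = cusip.strip().upper()
    # Accept 9-character CUSIPs and 8-character ones without a check digit.
    if len(cleaned) not in (8, 9):
        raise ValueError(f"expected an 8- or 9-character CUSIP, got {cusip!r}")
    return cleaned[:6]


def group_by_issuer(cusips):
    """Map each CUSIP6 issuer number to the CUSIPs that share it."""
    groups = defaultdict(list)
    for code in cusips:
        groups[cusip6(code)].append(code)
    return dict(groups)


if __name__ == "__main__":
    # Placeholder identifiers, not real securities.
    sample = ["123456AB1", "123456CD2", "654321EF3"]
    print(group_by_issuer(sample))
```

Grouping by prefix alone treats each CUSIP6 as a separate issuer; linking several CUSIP6 numbers to one entity requires a mapping such as the Associated Issuers master file described above.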

## Coppola et al. (2021) Data

The following files are from Coppola et al. ("Redrawing the Map of Global Capital Flows", Quarterly Journal of Economics, 2021), and the code used to build them can be found in the replication package for that paper (available at https://doi.org/10.7910/DVN/1SCTXG). File 1 contains the Coppola et al. (2021) ultimate parent aggregation dataset, which links affiliates to their corporate parents using CUSIP6 codes. Files 2 and 3 correspond to the Coppola et al. (2021) security masterfile, which contains static security-level characteristics information sourced and consolidated from Dealogic, Factset, and S&P – the two versions of the files are mergeable using CUSIP and ISIN codes, respectively:

1. `raw/cmns/cmns_aggregation.dta`
2. `raw/cmns/gcap_security_master_cusip.dta`
3. `raw/cmns/gcap_security_master_isin.dta`

## Additional Data

The following files contain specifications of the NAICS and GICS industry code structures, as well as a crosswalk between the SIC and NAICS classifications: 

1. `raw/industry_codes/sic_naics_crosswalk.csv (†)`
2. `raw/industry_codes/naics_structure.xlsx (†)`
3. `raw/industry_codes/gics_structure.xls (†)`

The following files provide crosswalks between ISO2 and ISO3 currency codes and between Morningstar typecodes and asset class definitions, respectively:

4. `raw/concordances/iso2_iso3.dta (†)`
5. `raw/concordances/morningstar_typecodes.dta (†)`

The following file contains the time series for the Gilchrist and Zakrajšek (2012) credit spread:

6. `raw/gz_credit_spread/ebp_csv.csv (†)`

The following file corresponds to the WRDS Bond Returns table, a consolidated product produced by the WRDS research team, which can be obtained from WRDS:

7. `raw/wrds/wrds_bond_returns.dta`

The following file contains SIC industry codes mapped to CUSIP6 identifiers and consolidated using information from Compustat and Capital IQ. The file is built by Maggiori et al. ("International Currencies and Capital Allocation", Journal of Political Economy, 2020), and the code used to build it can be found in the replication package for that paper (https://doi.org/10.1086/705688):

8. `raw/mns_industry/compustat_sic_merge.dta`

The following file contains security-level information obtained from Bloomberg’s OpenFIGI API, which is used as part of the Morningstar holdings data build to map the `externalid` field in the raw Morningstar data to consolidated security identifiers:

9. `raw/externalid/bbg_figi.csv`
